Patent Abstract:
  FILE SEPARATION BASED ON CONTENT
  The present invention relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transferring electronic data. In general, one aspect of the subject matter described in this specification can be embodied in methods (300, 400, 700) that include the actions of identifying a data item to be divided; determining the type of the data item; determining whether the type of the data item is one of one or more specified types; if it is determined that the type of the data item is not one of the one or more specified types, performing a first division of the data item; and if it is determined that the type of the data item is one of the one or more specified types, performing a second division of the data item that is based on particular content parts of the data item.
Publication No.: BR112013017971A2
Application No.: R112013017971-6
Filing Date: 2012-01-13
Publication Date: 2020-10-27
Inventors: Dominic B. Giampaolo; James L. Mensch; Cameron Stuart Birse; Ronnie G. Misra; Eric Olaf Carlson
Applicant: Apple Inc.
Main IPC:
Description:

Invention Patent Descriptive Report for "CONTENT-BASED FILE SEPARATION"

Background

The present invention relates to the transmission and storage of electronic data.
Data items, for example, files, are often transferred to different devices. For example, they can be shared with other devices (for example, peer devices) or transferred to a server or other storage device (for example, as a backup or as remote storage of the data item). Transferring large data items can consume network resources and can create problems when a transfer is interrupted before it completes. Consequently, conventional systems typically break large data items into chunks before transmitting them to a destination device.

Summary

This specification describes technologies for the transfer and storage of electronic data.
In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of identifying a data item to be divided; determining the type of the data item; determining whether the type of the data item is one of one or more specified types; if it is determined that the type is not one of the one or more specified types, performing a first division of the data item; and if it is determined that the type is one of the one or more specified types, performing a second division of the data item that is based on particular pieces of content of the data item. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the operations of the methods.
These and other embodiments can each optionally include one or more of the following features. Performing the second division includes introspecting the data item, generating a data map of pieces of content within the data item based on the introspection, and dividing the data item based on the data map. Performing the second division includes using the generated data map to define content-based division boundaries. Generating the data map includes identifying different types of content within the data item. Identifying the type of a data item includes identifying a file extension associated with the data item. Dividing the data item based on the data map includes dividing different types of content separately. The method further includes sending the chunks to a destination. The method further includes encrypting each chunk before sending. The method further includes, in response to receiving a request for the data item: sending a list of chunks, each chunk having a chunk identifier, to the requester; receiving a request for one or more chunks of the data item from the list, the one or more requested chunks being chunks that have changed from a previous version of the data item; and sending the one or more requested chunks.
In general, another aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a data item to be divided; identifying a type associated with the data item; using the identified type to introspect the data of the data item and build a content-based map of the data item; using the content-based map to identify a separate division to be performed for different parts of content in the data item; and dividing the data item based on the content in the data item. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the operations of the methods.
These and other embodiments can each optionally include one or more of the following features. Building the content-based map includes identifying different types of content within the data item. Dividing the data item includes performing separate division operations on one or more types of content identified within the data item.
Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages. Content-based division of data items allows parts of a data item that remain unchanged to keep the same chunks. As a result, the number of chunks stored and transmitted can be reduced, which also reduces processing and network usage costs.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

Brief Description of the Drawings

Figure 1 is a block diagram of an illustrative peer-to-peer system for transferring data items in chunks.
Figure 2 is a block diagram of an illustrative client-server system for transferring data items in chunks.
Figure 3 is a flow chart of an illustrative process for transferring data items in chunks.
Figure 4 is a flow chart of an illustrative process for content-based division.
Figure 5 illustrates an illustrative content-based mapping of a presentation file.
Figure 6 illustrates an illustrative content-based mapping of an audio file.
Figure 7 is a flow chart of an illustrative process for providing chunks in response to a request for a data item.
Figure 8 illustrates an illustrative system architecture.
Like reference numbers and designations in the various drawings indicate like elements.
Detailed Description

Data items are divided into two or more chunks for transmission, for example, to peer devices or other destinations (for example, a remote storage device or a backup server). Data items having particular types can be divided into chunks based on the content of the data items. Other data items that are not of the particular types can be divided into chunks according to a division process that does not use the content of the data items.
Content-based division can increase the amount of data that remains unchanged between versions. The chunks of a later version of a data item can be compared with the chunks of a previous version of the data item stored at the destination. Only the chunks that have changed are then transmitted to the destination. This reduces duplication of content across transmitted and/or stored versions of data items, referred to in this specification as deduplication.
Figures 1 and 2 illustrate block diagrams of illustrative systems in which chunked data items can be transmitted. Figure 1 is a block diagram of an illustrative peer-to-peer system 100 for transferring data items in chunks. The peer-to-peer system 100 includes device 102 and peer devices 104, 106, and 108 coupled through a network 110. Device 102 and peer devices 104, 106, and 108 can be various types of computing devices including, but not limited to, desktop computers, laptop computers, tablet devices, mobile devices, personal data assistants, etc. Network 110 can be part of a local area network, a wide area network, or the Internet. Device 102 is illustrated as including data 112 and data divider 114. Data 112 includes various types of data items stored on the device, for example, e-mail, photographs, documents, etc. Data divider 114 can divide one or more types of data items stored in data 112, as described in greater detail below with respect to Figures 3 and 4. Divided data items can be transmitted from device 102 to one or more of the peer devices 104, 106, and 108, for example, for synchronization (for example, of e-mail) or for transferring particular data items (for example, a particular presentation document). A given peer device can combine the chunks to reform the data item or can store the chunks until needed.

Figure 2 is a block diagram of an illustrative client-server system 200 for transmitting divided data items. The client-server system 200 includes client device 202, client device 204, server 206, and network 208. Client devices 202 and 204 can be various types of computing devices including, but not limited to, desktop computers, laptop computers, tablet devices, mobile devices, personal data assistants, etc. Network 208 can be part of a local area network, a wide area network, or the Internet.
Client device 202 includes data 210 and data divider 212. Similarly, client device 204 includes data 218 and data divider 220. Data dividers 212 and 220 can divide one or more types of data items stored in data 210 and data 218, respectively, as described in greater detail below with respect to Figures 3 and 4. Client devices 202 and 204 can be associated with a single individual (for example, a user's desktop and mobile devices) or can belong to different users. For example, two users can use the same backup storage server. Server 206 includes stored data 214 received from client devices 202 and 204. Server 206 can receive data items for storage as chunks from client devices 202 and 204.

In some implementations, data items are stored as chunks, for example, in a chunk store, and are reconstituted when necessary, for example, when opened by a peer device on which the chunks are stored. In some other implementations, the stored chunks are collected as needed, for example, when the data item is requested or when only part of the data item is needed. In still other implementations, the chunks are reconstituted into their respective data items for storage as stored data 214. In some implementations, server 206 includes a data divider 216. Server 206 can divide data items requested from server 206 before distributing them, for example, to a requesting client device.
Figure 3 is a flow chart of an illustrative process 300 for transferring data items in chunks. Process 300 can be performed, for example, by a system including one or more computing devices, for example, one or more computers, mobile devices, tablet devices, or servers.
A data item to be transmitted is identified (step 302). The data item can be identified, for example, in preparation for sending data items to a particular recipient. For example, a request for a data item can be received from another device. In another example, the data item can be sent to another device for storage, for example, according to a schedule or other criteria. The data item can be a file, a folder, or other data. For example, the data item can be a document, a media file (for example, an image, audio, or video file), an e-mail message, or any other type of file data.
A determination is made as to whether the data item has a specified type (step 304). The type of the data item can be identified, for example, using the file extension of the data item. In some other implementations, the type can be identified using other data, for example, file headers, magic numbers, or other data patterns indicative of a particular type of data item. The type of the data item can be compared with a list of specified types to determine whether a match exists. The list of specified types can be generated, for example, according to the types for which a particular division process has been developed. For example, if a content-based division process is developed for the .MP3 audio file type (that is, an audio file encoded in the MPEG-1 or MPEG-2 Audio Layer III format), that type is added to the list.
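A minimal sketch of the type check in step 304, assuming a hypothetical registry `SPECIFIED_TYPES` and a small table of leading-byte signatures. The extension is tried first and the magic-number check is only a fallback; the registry contents (including `key` as a presentation extension) are illustrative, not taken from the source.

```python
from pathlib import Path

# Hypothetical registry: types for which a content-based splitter has been developed.
SPECIFIED_TYPES = {"mp3", "jpg", "jpeg", "key"}

# A few well-known leading byte signatures ("magic numbers").
MAGIC_NUMBERS = {
    b"ID3": "mp3",            # MP3 file that starts with an ID3v2 tag
    b"\xff\xd8\xff": "jpeg",  # JPEG start-of-image marker
    b"PK\x03\x04": "zip",     # ZIP local file header
}


def identify_type(name: str, header: bytes) -> str:
    """Return a lowercase type name, preferring the file extension."""
    ext = Path(name).suffix.lower().lstrip(".")
    if ext:
        return ext
    # No extension: fall back to the first bytes of the file.
    for magic, kind in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return kind
    return "unknown"


def has_specified_type(name: str, header: bytes) -> bool:
    """Step 304: does the data item's type match one of the specified types?"""
    return identify_type(name, header) in SPECIFIED_TYPES
```

Adding support for a new format then only requires registering its type name once a content-based splitter exists for it.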
If it is determined that the data item is not one of the specified types, a first type of division is performed on the data item (step 306). The first type of division is based on data lengths and not on particular content parts of the data item. Various division techniques can be used that divide the data item into a series of chunks based on a particular chunk size and the amount of data to be divided. The first type of division can use fixed-length or variable-length chunks. Fixed-length division generates chunks of a fixed size, while variable-length division allows chunk sizes within a range according to specified division criteria. An illustrative conventional sequence for the first type of division begins by determining whether the amount of data to be divided is greater than a minimum chunk size. If the amount of data is not greater than the minimum chunk size, no division is necessary. If the amount of data is greater than the minimum chunk size, chunk ends are identified for each chunk of data. This can include performing one or more checksum operations (for example, a rolling checksum such as a Rabin checksum) to define a particular number of bytes of the data item to divide (for example, 40 kilobytes). In particular, when chunks are not of a fixed size but a range of chunk sizes is allowed, the chunk ends are identified starting at the minimum chunk size and increasing toward the maximum chunk size depending on the results of the rolling checksum. In this way, each chunk size falls within the range minimum chunk size ≤ chunk size ≤ maximum chunk size.
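The variable-length sequence above can be sketched as follows. This is a minimal illustration, assuming a simple polynomial rolling hash over a 48-byte window in place of a true Rabin fingerprint (which works over polynomials in GF(2)); the window, base, mask, and size limits are all hypothetical parameters. A cut is searched for only between the minimum and maximum chunk sizes, and the remainder once no more than the minimum size is left becomes the final chunk.

```python
# Hypothetical parameters for a polynomial rolling hash (stand-in for Rabin).
WINDOW = 48
BASE = 257
MOD = 1 << 32
# Declare a boundary when the low 12 bits of the hash are zero, which for
# random-looking data happens at roughly one position in 4096.
MASK = (1 << 12) - 1


def chunk_variable(data: bytes, min_size: int = 2048, max_size: int = 16384) -> list[bytes]:
    """Content-defined chunking: cut points depend on the bytes, not on offsets."""
    assert WINDOW <= min_size <= max_size
    out_factor = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    chunks: list[bytes] = []
    start, n = 0, len(data)
    while n - start > min_size:
        end = start + min_size               # never cut before the minimum size
        limit = min(start + max_size, n)     # always cut by the maximum size
        h = 0
        for b in data[end - WINDOW:end]:     # hash of the window ending at `end`
            h = (h * BASE + b) % MOD
        while end < limit and (h & MASK) != 0:
            # Slide the window one byte: add data[end], drop data[end - WINDOW].
            h = (h * BASE + data[end] - data[end - WINDOW] * out_factor) % MOD
            end += 1
        chunks.append(data[start:end])
        start = end
    if start < n:                            # remainder no larger than min_size
        chunks.append(data[start:])
    return chunks
```

Because cut points follow the content, inserting a byte near the start of the data typically changes only the first chunk; later boundaries realign on the same bytes.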
Each chunk then receives a particular chunk identifier, for example, derived from a hash of the chunk data. In some implementations, a secure hash algorithm (SHA) is used to generate the chunk identifier. The process is repeated to generate subsequent chunks from the remaining data of the data item until the data left to be divided is less than the minimum chunk size.

The chunks are transmitted to a destination (step 312). The chunks can be transmitted as they are generated, after some specified delay, or after all chunks of the data item have been generated. In some implementations, network bandwidth is determined and used as a factor in deciding when, and at what rate, to send the chunks.
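Chunk identifiers can be sketched with Python's standard `hashlib`. SHA-256 is assumed here as one member of the SHA family; the text only specifies a secure hash algorithm, and the helper names are illustrative.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Identifier derived from the chunk bytes: identical chunks get identical IDs."""
    return hashlib.sha256(chunk).hexdigest()


def identify_chunks(chunks: list[bytes]) -> list[tuple[str, bytes]]:
    """Pair every chunk with its identifier, preserving chunk order."""
    return [(chunk_id(c), c) for c in chunks]
```

Since the identifier depends only on the chunk bytes, two devices that produce the same chunk independently arrive at the same identifier without coordinating.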
If the data item is of a specified type, the data item is introspected to identify particular pieces of content within the data item (step 308). For example, for a presentation file (for example, a slide show), introspection can identify parts of the file corresponding to different types of content, for example, a slide index, images, and slide text. The identified pieces of content can then be used to perform content-based division of the data item (step 310).
Content-based division is described in greater detail below with reference to Figure 4. As with the first type of division, the chunks generated by content-based division are transmitted to the destination (step 312).
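The branch in process 300 can be sketched as a dispatcher, assuming that content-based splitters are registered in a dictionary keyed by type name, so that the dictionary's keys double as the list of specified types. `first_division` here is a plain fixed-size stand-in for step 306, and all names are hypothetical.

```python
from typing import Callable

Splitter = Callable[[bytes], list[bytes]]


def first_division(data: bytes, size: int = 4096) -> list[bytes]:
    """Stand-in for the length-based division of step 306 (fixed-size chunks here)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_item(kind: str, data: bytes, content_splitters: dict[str, Splitter]) -> list[bytes]:
    """Process 300: choose the division based on the data item's type."""
    splitter = content_splitters.get(kind)
    if splitter is None:        # not a specified type -> first division (step 306)
        return first_division(data)
    return splitter(data)       # specified type -> content-based division (steps 308-310)
```

Keeping the registry as data rather than branching code means a type joins the specified list exactly when a splitter for it is registered.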
Figure 4 is a flow chart of an illustrative process 400 for content-based division. Process 400 can be performed, for example, by a system including one or more computing devices, for example, one or more computers, mobile devices, or servers.
A data item to be divided is identified (step 402).
The data item to be divided can be identified, for example, based on a request for a particular data item from another device, or on a command to send a particular data item to another device (for example, for backup, for remote storage, or to a peer device), as described above with respect to Figure 3. The data item can be identified for division if it exceeds a specified size. The specified size can be equal to or greater than a minimum chunk size. A determination is made as to whether content-based division should be performed on the data item (step 404). The determination can be based on matching the type of the data item with one of a group of specified types. In particular, the type of the data item can be identified and compared with a list of types for which content-based division is available.
The type of the data item is used to introspect the data item (step 406). Introspection of the data item allows different types of content within the data item to be identified. For example, a slide show can include a slide index, images, text, etc. Similarly, an audio file includes metadata tags in addition to the audio data (for example, a song or other audio content).
A content-based map is built for the data item based on the introspection (step 408). In particular, the parts of the data item corresponding to different types of content (different pieces of content) are identified in order. Thus, for example, instead of simply identifying an audio data item, the parts of the audio file corresponding to the audio data and the parts corresponding to the tags can be identified separately.
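For an MP3 file, a content-based map of this kind can be sketched from two well-known tag layouts: an ID3v2 tag at the start (a 10-byte header whose size field stores 7 bits per byte) and a fixed 128-byte ID3v1 tag at the end beginning with `TAG`. This is a simplified sketch under that assumption; it ignores other layouts (for example, ID3v2 tags appended at the end or APE tags), and the `Region` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    offset: int
    length: int
    kind: str  # "tags" or "audio"


def syncsafe(b: bytes) -> int:
    """ID3v2 sizes store 7 bits per byte; the high bit of each byte is zero."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def map_mp3(data: bytes) -> list[Region]:
    """Ordered content-based map: leading ID3v2 tag, audio, trailing ID3v1 tag."""
    regions: list[Region] = []
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 10 + syncsafe(data[6:10])     # 10-byte header + tag body
        if data[5] & 0x10:                   # footer flag (ID3v2.4)
            size += 10
        size = min(size, len(data))
        regions.append(Region(0, size, "tags"))
        start = size
    trailer = None
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        trailer = Region(end - 128, 128, "tags")  # fixed 128-byte ID3v1 tag
        end -= 128
    regions.append(Region(start, end - start, "audio"))
    if trailer is not None:
        regions.append(trailer)
    return regions
```

The resulting tags, audio, tags ordering matches the layout of an audio file with tag parts on either side of the music.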
The content-based map is used to divide the data item (step 410). In particular, the content-based map can be used to identify the pieces of content in the data item that are unlikely to change between versions of the data item. For example, for an audio file, the tags can change each time the audio is played: a play counter can be incremented, or a last-played date can be updated. However, the data corresponding to the audio content itself (for example, the music) typically remains unchanged. Content-based division can divide different types of content within a data item separately, so that content that is unlikely to change between versions or uses yields the same chunks each time the data item is divided.
The type of processing or division performed can depend on whether the content is likely to change. For example, division boundaries (for example, boundaries across which a chunk cannot extend) can be established between different types of content in the data item. In addition, the type of division performed can vary depending on the particular content. For example, images in a presentation file can be divided into fixed-size chunks. In particular, each image can be divided separately. In addition, image data and tag data can be divided separately, for example, each into fixed-size chunks. Fixed-size chunks all have the same size except for the last chunk, which can be smaller depending on the remaining data. For example, an item with a size of 1.6 megabytes (MB) and a fixed chunk size of 1 MB results in two chunks: a first chunk of 1 MB and a second chunk of 0.6 MB. Since the image is not expected to change, these fixed-size chunks are unlikely to change when the data item is divided again. Because of the chunk boundaries, additional data from other content (for example, data following the image in the presentation file) is not added to fill a fixed-size chunk. That way, even if the chunks for one piece of content change in a subsequent version, the changes do not spill over into the chunks of other pieces of content.
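Fixed-size division bounded by the content map can be sketched as below, assuming the map is given as hypothetical `(offset, length)` pairs and taking 1 MB as 1,000,000 bytes for the worked example. Each region is chunked on its own, so its last chunk may be short rather than being filled with bytes from the following content.

```python
def chunk_fixed_bounded(data: bytes, regions: list[tuple[int, int]], size: int) -> list[bytes]:
    """Fixed-size chunks that never cross a region boundary.

    `regions` lists (offset, length) pairs from a content-based map, in order.
    """
    chunks = []
    for offset, length in regions:
        stop = offset + length
        for i in range(offset, stop, size):
            chunks.append(data[i:min(i + size, stop)])   # last chunk may be short
    return chunks
```

With a 1.6 MB region and 1 MB chunks this yields a 1 MB chunk and a 0.6 MB chunk. Without the boundary between two regions of 1500 and 500 bytes, a 1000-byte chunker would instead produce a chunk mixing 500 bytes of each region, so an edit to the second region would also change a chunk holding data from the first.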
In another example, parts that are more likely to change can be divided using variable chunk sizes (for example, as described above with respect to Figure 3) in an attempt to reduce the number of chunks that change for a given modification. That way, a change to the play counter of an audio file does not necessarily change all of the chunks within the tag part of the audio data item.

Each chunk is assigned a chunk identifier, for example, according to a hash function applied to each generated chunk, as described above.
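Choosing a strategy per content type can be sketched as follows. This assumes a hypothetical policy (`STABLE_KINDS`) in which stable content gets fixed-size chunks and volatile content gets variable-length chunks from a gear-based rolling hash, the kind used in FastCDC-style chunkers; it is swapped in here for brevity and is not the checksum named in the text. The gear table is seeded so that every run derives the same boundaries.

```python
import random

# Seeded gear table for a gear-based rolling hash (deterministic across runs).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

# Hypothetical policy: content expected to stay stable gets fixed-size chunks.
STABLE_KINDS = {"audio", "image"}


def chunk_cdc(seg: bytes, min_size: int = 64, max_size: int = 1024, mask: int = 0xFF) -> list[bytes]:
    """Variable-length chunks whose boundaries depend on the bytes themselves."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(seg):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF   # a byte's influence shifts out after 32 steps
        length = i + 1 - start
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(seg[start:i + 1])
            start, h = i + 1, 0
    if start < len(seg):
        chunks.append(seg[start:])
    return chunks


def chunk_fixed(seg: bytes, size: int = 4096) -> list[bytes]:
    return [seg[i:i + size] for i in range(0, len(seg), size)]


def split_by_map(data: bytes, regions: list[tuple[int, int, str]]) -> list[bytes]:
    """Divide each (offset, length, kind) region separately with a kind-specific strategy."""
    chunks = []
    for offset, length, kind in regions:
        seg = data[offset:offset + length]
        chunks += chunk_fixed(seg) if kind in STABLE_KINDS else chunk_cdc(seg)
    return chunks
```

An edit that grows the tag region changes only tag chunks; the fixed-size audio chunks start at the region boundary and are reproduced exactly.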
The divided data is transmitted to a destination (step 412). For example, the chunks can be transmitted to a requesting device or to a storage server. In some implementations, only some of the chunks are transmitted. For example, a request for a later version of a data item can result in only those chunks that changed from a previous version being transmitted, as described in greater detail below with respect to Figure 7. For example, if only the chunks associated with the tags of an audio data item have changed, the chunks for the audio data, which make up most of the data item, do not need to be transmitted.
In some implementations, content-based division can be performed recursively depending on the content. In particular, when a data item is a container holding one or more additional types of content (for example, MS text, zip, jar, etc.), introspection identifies the boundaries of the embedded content and then identifies one or more corresponding content-based divisions for each type of embedded content. For example, this allows an image embedded in another file format to be recognized and a content-based division suitable for images to be applied to that part of the file.

Figure 5 illustrates an illustrative content-based mapping 500 of a presentation file and the corresponding content-based division parts. In particular, the example in Figure 5 is a presentation data item (for example, a slide show). Introspection has identified different content arranged as illustrated in content-based mapping 500. In the illustrative presentation file of Figure 5, a slide index is followed by slide text, an image, an annotation, and two more images. The parts of the data item to be divided together are identified based on the content-based mapping.
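Both the recursive container case and a flat map such as mapping 500 can be handled by flattening a nested content map into ordered leaf regions, each of which is then divided on its own. A minimal sketch, assuming a hypothetical `Node` tree produced by introspection; bytes of a container that lie between its children are kept as regions of the container's own kind.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    offset: int
    length: int
    children: list["Node"] = field(default_factory=list)


def leaf_regions(node: Node) -> list[tuple[int, int, str]]:
    """Flatten a nested content map into ordered (offset, length, kind) leaf regions."""
    if not node.children:
        return [(node.offset, node.length, node.kind)]
    out = []
    cursor = node.offset
    for child in sorted(node.children, key=lambda c: c.offset):
        if child.offset > cursor:                  # container bytes before this child
            out.append((cursor, child.offset - cursor, node.kind))
        out.extend(leaf_regions(child))            # recurse into embedded content
        cursor = child.offset + child.length
    end = node.offset + node.length
    if cursor < end:                               # container bytes after the last child
        out.append((cursor, end - cursor, node.kind))
    return out
```

For example, a zip container holding a JPEG (itself split into metadata and image data) and a text member yields separate regions for each embedded part, so an image-specific division can be applied to the image regions only.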
For example, as illustrated in Figure 5, the slide index and slide text can be divided together, for example, into variable-length chunks (as described above with respect to Figure 3). The division of this part is bounded by the first image in the presentation file so that the last chunk does not cross into the image. This allows different chunks to be used for different types of content, so that unchanged content can retain the same chunks across versions. Chunk deduplication can then occur when the data item is sent to a destination, since only the chunks that have changed need to be sent.
The images are illustrated as having fixed-size chunks. Each image is divided separately. Additionally, for each image, the image data can be divided separately from any image tag data. For the sake of illustration, Figure 5 shows one illustrative image with its tag data separated from the image data. Additionally, in some implementations, tag data can be divided into variable-length chunks instead of the fixed-size chunks illustrated.
In some implementations, the division is further refined within the particular content. For example, if the image is a JPEG image, a type-specific content-based division can be performed that uses a small fixed-size chunk for the metadata part of the image and a different fixed-size division for the remaining image data.
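One way to find the JPEG split point is to walk the marker segments from the start-of-image marker until the start-of-scan (SOS, `0xFFDA`) marker, where the entropy-coded image data begins. This is a simplified sketch: it assumes every marker before SOS carries a two-byte big-endian length (which includes the length bytes themselves), it treats all header segments (EXIF, quantization tables, and so on) as the "metadata" part, and the chunk sizes are hypothetical.

```python
def jpeg_data_offset(data: bytes) -> int:
    """Offset of the SOS (0xFFDA) marker, where entropy-coded image data begins."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        if data[pos + 1] == 0xDA:
            return pos
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")  # includes its own 2 bytes
        pos += 2 + seg_len
    raise ValueError("no SOS marker found")


def split_jpeg(data: bytes, meta_size: int = 1024, image_size: int = 65536) -> list[bytes]:
    """Small fixed-size chunks for the header/metadata, larger ones for image data."""
    cut = jpeg_data_offset(data)
    meta, image = data[:cut], data[cut:]
    return ([meta[i:i + meta_size] for i in range(0, len(meta), meta_size)]
            + [image[i:i + image_size] for i in range(0, len(image), image_size)])
```

Editing EXIF fields then changes only the small metadata chunks, while the larger image-data chunks keep their identifiers.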
Figure 6 illustrates an illustrative content-based mapping 600 of an audio file. The content-based mapping includes a music part located between two tag parts.
Each of these parts can be divided separately, for example, using fixed-size chunks. That way, if a tag is modified or added in a later version, only the chunks in the tag part will change. In contrast, using a division technique that is not based on content, for example, as described above with respect to Figure 3, can leave few chunks unchanged. For example, modifying or adding tags in the first tag part at the beginning of the audio data item can shift chunk boundaries throughout the data item. In particular, since there is no division boundary, a chunk can cross over to include both tag data and music data. As a result, a request for a new version can require a complete set of chunks to be sent, because an early change to a generated chunk (for example, a chunk boundary modified due to added data) can propagate throughout the data, resulting in little or no chunk deduplication.
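The contrast can be demonstrated with a small sketch, assuming 16 KiB of random bytes as a stand-in for the music part and 1 KiB fixed-size chunks. Growing the leading tag part by three bytes shifts every fixed-offset chunk of the whole file, while dividing each part separately keeps all music chunks and the trailing tag chunk.

```python
import hashlib
import random


def fixed(data: bytes, size: int = 1024) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def ids(chunks: list[bytes]) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in chunks}


def per_part(parts: list[bytes], size: int = 1024) -> list[bytes]:
    """Content-based: each part (tags, music, tags) is chunked on its own."""
    return [c for p in parts for c in fixed(p, size)]


music = random.Random(1).randbytes(16 * 1024)   # stand-in for 16 KiB of audio
v1 = [b"T" * 100, music, b"E" * 50]             # leading tags, music, trailing tags
v2 = [b"T" * 103, music, b"E" * 50]             # a tag grew by three bytes

whole_reused = len(ids(fixed(b"".join(v2))) & ids(fixed(b"".join(v1))))
part_reused = len(ids(per_part(v2)) & ids(per_part(v1)))
print(whole_reused, part_reused)  # → 0 17
```

Only the single leading tag chunk has to be resent in the per-part case; in the whole-file case every chunk is new.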
In some implementations, encrypted chunks are generated. For example, convergent encryption can be used. Convergent encryption allows a chunk to be stored in an unsecured location. Individual chunks having a chunk identifier can be encrypted with a key corresponding to the chunk's hash identifier, for example, using AES encryption. A new chunk identifier can then be generated for each chunk by hashing the encrypted chunk. Other encryption schemes can alternatively be used, for example, a shared-key or public/private-key encryption scheme.
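The convergent-encryption flow can be sketched with the standard library only. To stay self-contained, this toy swaps AES for a SHA-256 counter-mode keystream; it illustrates the data flow (key derived from the chunk, new identifier from the ciphertext) and is not a secure cipher. All function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream of SHA-256(key || counter) blocks; a stand-in for AES only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def convergent_encrypt(chunk: bytes) -> tuple[str, bytes, bytes]:
    """Return (new chunk ID, key, ciphertext); the key is derived from the chunk itself."""
    key = hashlib.sha256(chunk).digest()
    ciphertext = bytes(a ^ b for a, b in zip(chunk, _keystream(key, len(chunk))))
    new_id = hashlib.sha256(ciphertext).hexdigest()  # identifier of the encrypted chunk
    return new_id, key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(ciphertext, _keystream(key, len(ciphertext))))
```

Because encryption is deterministic in the chunk contents, two users who encrypt the same chunk produce the same ciphertext and the same new identifier, which keeps deduplication possible on the stored, encrypted chunks.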
Figure 7 is a flow chart of an illustrative process 700 for providing chunks in response to a request for a data item. A request for a new version of a data item is received (step 702). For example, during a synchronization process, a later version of a data item can be identified as not present on a peer device. The peer device can then request the later version of the data item. Alternatively, a computing device can request all data items that have changed (for example, since the last backup or storage event) for upload to a backup server.

The data item is divided (step 704). The data item is divided as described above, depending on the type of the data item. In some implementations, content-based division is performed on the data item as described with respect to Figures 4 to 6. A chunk list (for example, listing chunk identifiers or encrypted chunk identifiers) is generated (step 706). For example, each chunk can have an identifier derived from an applied hash function. Thus, if the data for a particular chunk has not changed from an earlier version, its hash has not changed either. In some implementations, the chunk list is sent to a requesting device, for example, a peer device that requested the data item. In some other implementations, the chunk list is sent to a backup server or another device to which the data item (or parts thereof) will be sent. The chunk list can be compared with the chunks already present on the requesting device.

A request for one or more chunks from the list is received (step 708). For example, a device can request only the new chunks of the data item and not those that remain the same and are already available on the device. In this way, redundant chunks do not need to be transmitted or duplicated on the storage device. The requested chunks are sent (step 710). Unsent chunks can be stored or discarded.
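Steps 706 to 710 can be sketched as an in-memory exchange, assuming SHA-256 chunk identifiers and a requester that keeps its chunks in a dictionary keyed by identifier. The function names are hypothetical and the network transport is omitted.

```python
import hashlib


def chunk_list(chunks: list[bytes]) -> list[str]:
    """Step 706: ordered list of chunk identifiers for the new version."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def chunks_to_request(offered: list[str], local: dict[str, bytes]) -> list[str]:
    """Requester side of step 708: ask only for chunks not already held (deduplicated)."""
    return [i for i in dict.fromkeys(offered) if i not in local]


def serve(chunks: list[bytes], requested: list[str]) -> dict[str, bytes]:
    """Step 710: send just the requested chunks."""
    by_id = dict(zip(chunk_list(chunks), chunks))
    return {i: by_id[i] for i in requested}


def reassemble(offered: list[str], local: dict[str, bytes]) -> bytes:
    """Rebuild the new version from the ordered list once every chunk is local."""
    return b"".join(local[i] for i in offered)
```

For a new version that differs from the old one only in its tag chunk, the requester asks for exactly one chunk and rebuilds the rest from what it already holds.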
In some implementations, stored chunks can be shared among multiple users. For example, a particular audio file can include tag information that is unique to each user (for example, a play counter), while the audio content remains the same. Thus, if multiple users store the same audio file in a remote storage location, multiple copies of the common audio data do not need to be stored.
In particular, for users applying the same content-based division technique, the chunk identifiers for the audio part of the audio file should match for each user. Thus, when the chunk list is sent to the remote storage location, common chunks can be identified as already stored by another user who stored the same audio file. As a result, only the chunks unique to the user (for example, for tag information) need to be transmitted for storage.
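A remote store that keeps one copy of each chunk while tracking which users reference it can be sketched as follows. The class, its methods, and the ownership bookkeeping are hypothetical; SHA-256 identifiers are assumed, and uploaded chunks are checked against their claimed identifiers before being stored.

```python
import hashlib


def cid(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class SharedChunkStore:
    """Remote store keeping one copy of each chunk, plus which users reference it."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.owners: dict[str, set[str]] = {}

    def missing(self, chunk_ids: list[str]) -> list[str]:
        """Given a user's chunk list, return IDs the store does not yet hold."""
        return [i for i in chunk_ids if i not in self.blobs]

    def commit(self, user: str, chunk_ids: list[str], uploads: dict[str, bytes]) -> None:
        """Record the user's chunk list, storing only newly uploaded chunks."""
        for i, data in uploads.items():
            if cid(data) != i:
                raise ValueError("chunk does not match its identifier")
            self.blobs.setdefault(i, data)
        for i in chunk_ids:
            if i not in self.blobs:
                raise KeyError(f"chunk {i} was neither stored nor uploaded")
            self.owners.setdefault(i, set()).add(user)
```

When a second user stores the same song with different tags, `missing` reports only that user's tag chunk, and the shared audio chunk gains a second owner instead of a second copy.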
In some implementations, chunks can be collected as needed. For example, when streaming data from another location, chunks of the streamed data can be collected as needed. For example, if the data item is a movie file, the chunks can be collected only as needed, for example, as playback proceeds. The chunks can be requested sequentially or in random order, but not all chunks need to be transmitted at once. Another example is a client that indexes data. For example, a music playback application may collect only the tag data of audio files to generate an index of all files, without requiring the actual audio data (for example, the chunks corresponding to audio data) until the files are actually played.
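On-demand collection can be sketched as a data item that knows its ordered chunk identifiers but fetches each chunk only on first access, then caches it. The `fetch` callable is a hypothetical stand-in for a request to the remote location.

```python
from typing import Callable


class LazyDataItem:
    """Data item whose chunks are fetched only when first accessed, then cached."""

    def __init__(self, chunk_ids: list[str], fetch: Callable[[str], bytes]) -> None:
        self.chunk_ids = chunk_ids
        self._fetch = fetch          # e.g. a network request to the remote store
        self._cache: dict[str, bytes] = {}

    def chunk(self, index: int) -> bytes:
        cid = self.chunk_ids[index]
        if cid not in self._cache:
            self._cache[cid] = self._fetch(cid)
        return self._cache[cid]
```

An indexing client would call `chunk` only for the tag chunk, so the audio chunks are never transferred unless the file is played.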
In some implementations, due to policy, licensing terms, etc., each user may be required to transmit all chunks of particular types of data items to the remote storage location to establish proof that the user actually has the data in question.
However, the remote storage location can still retain only a single copy of the chunks, since they are known to be identical, for example, because they have the same identifier.
Figure 8 illustrates an illustrative architecture of a system 800. The architecture of system 800 is capable of performing content-based division of data items.
Architecture 800 includes zero or more processors 802 (for example, IBM PowerPC, Intel Pentium 4, ARM, etc.), zero or more display devices 804 (for example, CRT, LCD), zero or more graphics processing units 806 (for example, NVIDIA GeForce, etc.), zero or more network interfaces 808 (for example, Ethernet, FireWire, USB, etc.), zero or more input devices 810 (for example, keyboard, mouse, etc.), and zero or more computer-readable media 812. These components exchange communications and data using one or more buses 814 (for example, EISA, PCI, PCI Express, etc.). In some implementations, some remote storage and/or division systems may not include display devices or peripherals.
In addition, the chunks can be stored on network or remote storage devices that interact with one or more other systems to process and store the chunks of data.
The term "computer-readable medium" refers to any medium that participates in providing instructions to a processor 802 for execution. The computer-readable medium 812 further includes an operating system 816 (for example, Mac OS, iOS, Windows, Linux, etc.), a network communication module 818, a content-based divider 822, and other applications 824. Operating system 816 can be multi-user, multiprocessing, multitasking, multithreading, real-time, and the like.
Operating system 816 performs basic tasks, including, but not limited to: recognizing input from input devices 810; sending output to display devices 804; keeping track of files and directories on computer-readable media 812 (for example, memory or a storage device); controlling peripheral devices (for example, disk drives, printers, etc.); and managing traffic on the one or more buses 814. Network communications module 818 includes various components for establishing and maintaining network connections (for example, software implementing communication protocols such as TCP/IP, HTTP, Ethernet, etc.).
Content-based divider 822 provides various software components for performing the functions of content-based division described with respect to Figures 1 to 7.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, for example, a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (for example, multiple CDs, disks, or other storage devices).
The operations described in this specification can be implemented as operations performed by data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including, for example, a programmable processor, a computer, a system on a chip, or multiple ones or combinations of the foregoing. The apparatus can include special-purpose logic circuitry, for example, an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, for example, code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing, and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special-purpose logic circuitry, for example, an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, for example, both general- and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory, a random access memory, or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example, magnetic disks, magneto-optical disks, or optical discs. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, for example, a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (for example, a universal serial bus (USB) flash drive), to name just a few.
Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks, for example, internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
In some implementations in which the user interacts directly with a system, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, for example, a mouse or a trackball, by which the user can provide input to the computer.
Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Additionally, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server, or that includes a front-end component, for example, a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, for example, a communication network. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an inter-network (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks).
The computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (for example, an HTML page) to a client device (for example, for purposes of displaying data to, and receiving user input from, a user interacting with the client device). Data generated at the client device (for example, a result of the user interaction) can be received from the client device at the server.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination. Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
Claims (10)
[1]
1. A method performed by data processing apparatus, the method comprising: identifying a data item to be divided; determining a type of the data item; determining whether the type of the data item is one of one or more specified types; in response to determining that the type of the data item is not one of the one or more specified types, performing a first division of the data item; and in response to determining that the type of the data item is one of the one or more specified types, performing a second division of the data item that is based on particular content pieces of the data item.
[2]
2. The method of claim 1, wherein performing the second division comprises: introspecting the data item; generating a data map of content pieces within the data item based on the introspection; and dividing the data item based on the data map.
[3]
3. The method of claim 2, wherein performing the second division comprises using the generated data map to define chunk boundaries.
[4]
4. The method of claim 2, wherein generating the data map comprises identifying different types of content within the data item.
[5]
5. The method of claim 4, wherein identifying the type of the data item comprises identifying a file extension associated with the data item.
[6]
6. The method of claim 4, wherein dividing the data item based on the data map comprises dividing different types of content separately.
[7]
7. The method of claim 2, further comprising: sending the chunks to a destination.
[8]
8. The method of claim 6, further comprising: encrypting each chunk before sending.
[9]
9. The method of claim 1, further comprising: in response to receiving a request for the data item, sending a list of chunks, each chunk having a chunk identifier, to the requester; receiving a request for one or more chunks of the data item from the list of chunks, the one or more requested chunks being chunks that have changed from a previous version of the data item; and sending the one or more requested chunks.
[10]
10. A system comprising: means for identifying a data item to be divided; means for determining a type of the data item; means for determining whether the type of the data item is one of one or more specified types; means for performing, in response to determining that the type of the data item is not one of the one or more specified types, a first division of the data item; and means for performing, in response to determining that the type of the data item is one of the one or more specified types, a second division of the data item that is based on particular content pieces of the data item.
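Claim 1 (and its means-plus-function counterpart, claim 10) comes down to a type check that selects between two division strategies. The Python sketch below illustrates that control flow under stated assumptions: the first division is taken to be fixed-size chunking, and the only specified type is a hypothetical `"text"` type divided at paragraph breaks. The names `divide`, `fixed_size_division`, `paragraph_division`, `CHUNK_SIZE`, and `CONTENT_DIVIDERS` are illustrative and do not come from the specification.

```python
import re
from typing import Callable, Dict, List

Divider = Callable[[bytes], List[bytes]]

# Hypothetical chunk size for the first (content-agnostic) division.
CHUNK_SIZE = 4


def fixed_size_division(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """First division (assumed): equal-size chunks, ignoring content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def paragraph_division(data: bytes) -> List[bytes]:
    """Toy content-aware divider: one chunk per blank-line-terminated paragraph."""
    # Splitting at the zero-width position after each blank line keeps the
    # separators, so joining the chunks reproduces the original bytes.
    return [piece for piece in re.split(rb"(?<=\n\n)", data) if piece]


def divide(data: bytes, item_type: str, dividers: Dict[str, Divider]) -> List[bytes]:
    """Perform the first or the second division depending on the item type."""
    content_divider = dividers.get(item_type)
    if content_divider is None:
        # Type is not one of the specified types: first division.
        return fixed_size_division(data)
    # Type is one of the specified types: second, content-based division.
    return content_divider(data)


# Hypothetical table of specified types and their content-based dividers.
CONTENT_DIVIDERS: Dict[str, Divider] = {"text": paragraph_division}

print(divide(b"abcdefghij", "binary", CONTENT_DIVIDERS))
# [b'abcd', b'efgh', b'ij']
print(divide(b"intro\n\nbody\n\nend", "text", CONTENT_DIVIDERS))
# [b'intro\n\n', b'body\n\n', b'end']
```

Passing the dividers in as a table keeps `divide` independent of any particular file format; supporting another specified type would only mean registering another divider.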
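Claims 2 and 3 describe introspecting the data item, generating a data map of its content pieces, and using that map to define chunk boundaries. The sketch below assumes, purely for illustration, that the specified type is a ZIP archive. Python's standard `zipfile` module exposes each member's `ZipInfo.header_offset`, which here serves as the introspection step. The data-map layout and the functions `build_data_map` and `split_by_data_map` are hypothetical.

```python
import io
import zipfile
from typing import List, Tuple

# Hypothetical data map: (content piece name, byte offset where it starts).
DataMap = List[Tuple[str, int]]


def build_data_map(data: bytes) -> DataMap:
    """Introspect a ZIP archive and record where each member's local header begins."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        entries = [(info.filename, info.header_offset) for info in archive.infolist()]
    return sorted(entries, key=lambda entry: entry[1])


def split_by_data_map(data: bytes, data_map: DataMap) -> List[bytes]:
    """Use the data map's offsets as chunk boundaries."""
    # Every member start is a boundary. The last chunk runs to the end of the
    # file, so in this sketch it also carries the archive's central directory.
    boundaries = sorted({0, *(offset for _, offset in data_map), len(data)})
    return [data[start:end] for start, end in zip(boundaries, boundaries[1:])]


# Build a small in-memory archive to divide.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "hello")
    archive.writestr("image.raw", b"\x00" * 64)
item = buffer.getvalue()

data_map = build_data_map(item)
chunks = split_by_data_map(item, data_map)
print([name for name, _ in data_map], len(chunks))
# ['notes.txt', 'image.raw'] 2
```

Because the central directory records every member's offset, a fuller divider would probably give it a boundary of its own. The sketch leaves it in the last chunk to stay short.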
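Claim 5 identifies the data item's type from its file extension. The following minimal sketch uses `pathlib.PurePath.suffix`. The extension-to-type table (`EXTENSION_TYPES`) and the set of types that receive content-based division (`CONTENT_DIVIDED_TYPES`) are hypothetical placeholders, because the claims do not say which types are specified.

```python
from pathlib import PurePath
from typing import Dict, FrozenSet

# Hypothetical mapping from file extension to data item type.
EXTENSION_TYPES: Dict[str, str] = {
    ".zip": "archive",
    ".mov": "video",
    ".txt": "text",
}

# Hypothetical set of types that receive the content-based (second) division.
CONTENT_DIVIDED_TYPES: FrozenSet[str] = frozenset({"archive", "video"})


def item_type(path: str) -> str:
    """Determine the data item type from its file extension (case-insensitive)."""
    return EXTENSION_TYPES.get(PurePath(path).suffix.lower(), "unknown")


def uses_content_division(path: str) -> bool:
    """True when the item's type is one of the specified types."""
    return item_type(path) in CONTENT_DIVIDED_TYPES


for name in ["Backup.ZIP", "notes.txt", "README"]:
    print(name, item_type(name), uses_content_division(name))
# Backup.ZIP archive True
# notes.txt text False
# README unknown False
```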
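Claims 4 and 6 add that the data map identifies different types of content within the item and that each type is divided separately. One way to picture this, assuming the data map lists `(offset, length, content_type)` entries, is to route each mapped region to a per-type rule. In the sketch below, hypothetical `"metadata"` regions stay whole and hypothetical `"media"` regions are cut into fixed-size chunks. The `Region` layout, the type names, and `MEDIA_CHUNK_SIZE` are all illustrative assumptions.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """One hypothetical data-map entry: a contiguous run of one content type."""
    offset: int
    length: int
    content_type: str


MEDIA_CHUNK_SIZE = 8  # hypothetical chunk size for media regions


def divide_by_content_type(data: bytes, data_map: List[Region]) -> Dict[str, List[bytes]]:
    """Divide each content type separately, returning chunks grouped by type."""
    chunks: Dict[str, List[bytes]] = {}
    for region in sorted(data_map, key=lambda r: r.offset):
        piece = data[region.offset:region.offset + region.length]
        if region.content_type == "media":
            # Fixed-size chunks inside the region only, so no chunk
            # straddles a boundary between two content types.
            parts = [piece[i:i + MEDIA_CHUNK_SIZE]
                     for i in range(0, len(piece), MEDIA_CHUNK_SIZE)]
        else:
            # Any other content type stays in one chunk per region.
            parts = [piece]
        chunks.setdefault(region.content_type, []).extend(parts)
    return chunks


item = b"HDR:v1;" + bytes(range(20))
data_map = [Region(0, 7, "metadata"), Region(7, 20, "media")]
result = divide_by_content_type(item, data_map)
print({kind: [len(c) for c in parts] for kind, parts in result.items()})
# {'metadata': [7], 'media': [8, 8, 4]}
```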
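Claim 9 describes an update exchange. The holder of the data item sends a list of chunks with identifiers, the requester asks only for the chunks that changed relative to its previous version, and the holder sends those chunks. The minimal sketch below assumes, although the claim does not say so, that a chunk identifier is the SHA-256 digest of the chunk's bytes. With that assumption, any identifier the requester lacks counts as a changed chunk. `ChunkServer`, `chunk_id`, and `update` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_id(chunk: bytes) -> str:
    """Hypothetical chunk identifier: the SHA-256 digest of the chunk's bytes."""
    return hashlib.sha256(chunk).hexdigest()


class ChunkServer:
    """Holds the current version of a data item as identified chunks."""

    def __init__(self, chunks: List[bytes]) -> None:
        self.order = [chunk_id(c) for c in chunks]
        self.store: Dict[str, bytes] = {chunk_id(c): c for c in chunks}

    def chunk_list(self) -> List[str]:
        # Response to a request for the data item: the ordered chunk list.
        return list(self.order)

    def send_chunks(self, requested: List[str]) -> Dict[str, bytes]:
        # Response to a request for specific chunks from that list.
        return {cid: self.store[cid] for cid in requested}


def update(server: ChunkServer, old_chunks: List[bytes]) -> Tuple[bytes, List[str]]:
    """Rebuild the new version, fetching only chunks absent from the old one."""
    have = {chunk_id(c): c for c in old_chunks}
    listing = server.chunk_list()
    # dict.fromkeys drops duplicate identifiers while preserving their order.
    changed = [cid for cid in dict.fromkeys(listing) if cid not in have]
    have.update(server.send_chunks(changed))
    return b"".join(have[cid] for cid in listing), changed


old = [b"header", b"body-v1", b"footer"]
new = [b"header", b"body-v2", b"footer"]
rebuilt, requested = update(ChunkServer(new), old)
print(rebuilt, len(requested))
# b'headerbody-v2footer' 1
```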
Similar technologies:
Publication number | Publication date | Title
BR112013017971A2|2020-10-27|file separation based on content
Zhao et al.|2013|Liquid: A scalable deduplication file system for virtual machine images
JP5932024B2|2016-06-08|Reference count propagation
US9792344B2|2017-10-17|Asynchronous namespace maintenance
US9110603B2|2015-08-18|Identifying modified chunks in a data set for storage
Liu et al.|2020|A low-cost multi-failure resilient replication scheme for high-data availability in cloud storage
JP2012155323A|2012-08-16|System and method for netbackup data decryption in high-latency and low-bandwidth environment
US9357004B2|2016-05-31|Reference count propagation
BR112014014336B1|2021-08-17|COMPUTER IMPLEMENTED METHOD TO DYNAMICALLY DISTRIBUTE DATA AND COMPUTER SYSTEM.
US9558374B2|2017-01-31|Methods and systems for securing stored information
TW202029694A|2020-08-01|Systems and methods for efficient and secure processing, accessing and transmission of data via a blockchain network
Xu et al.|2015|Data deduplication mechanism for cloud storage systems
Nicolae et al.|2015|Discovering and leveraging content similarity to optimize collective on-demand data access to IaaS cloud storage
US10893106B1|2021-01-12|Global namespace in a cloud-based data storage system
US10015248B1|2018-07-03|Syncronizing changes to stored data among multiple client devices
Devi et al.|2014|Enhanced dynamic whole file de-duplication for space optimization in private cloud storage backup
US20210318993A1|2021-10-14|Deduplication of encrypted data using multiple keys
US10628379B1|2020-04-21|Efficient local data protection of application data in storage environments
US10613755B1|2020-04-07|Efficient repurposing of application data in storage environments
US11283604B2|2022-03-22|Sharing encrypted data with enhanced security by removing unencrypted metadata
Devi et al.|2015|Enhanced Intensive Indexing De-Duplication for Space Optimization in Private Cloud Storage Backup
US20160352517A1|2016-12-01|Sharing encrypted data with enhanced security
Vangoor|2016|To FUSE or not to FUSE? Analysis and Performance Characterization of the FUSE User-Space File System Framework
Patent family:
Publication number | Publication date
JP2016001480A|2016-01-07|
JP2014508990A|2014-04-10|
AU2012205443A1|2013-08-22|
CN103403712A|2013-11-20|
US8909657B2|2014-12-09|
US20150095385A1|2015-04-02|
KR101573995B1|2015-12-02|
US9305008B2|2016-04-05|
CN103403712B|2016-10-26|
MX2013008194A|2013-08-21|
US20120185448A1|2012-07-19|
JP6199931B2|2017-09-20|
AU2012205443B2|2015-09-03|
WO2012097217A1|2012-07-19|
KR20130120516A|2013-11-04|
EP2663938A1|2013-11-20|
Legal status:
2020-11-10| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-03-09| B07A| Application suspended after technical examination (opinion) [chapter 7.1 patent gazette]|
2021-07-06| B07A| Application suspended after technical examination (opinion) [chapter 7.1 patent gazette]|
2021-11-16| B09B| Patent application refused [chapter 9.2 patent gazette]|
2021-12-07| B350| Update of information on the portal [chapter 15.35 patent gazette]|
2022-02-01| B09B| Patent application refused [chapter 9.2 patent gazette]|Free format text: REFUSAL MAINTAINED, SINCE NO APPEAL WAS FILED WITHIN THE LEGAL TIME LIMIT |
Priority:
Application number | Application date | Title
US201161433152P| true| 2011-01-14|2011-01-14|
US61/433,152|2011-01-14|
US13/250,504|US8909657B2|2011-01-14|2011-09-30|Content based file chunking|
US13/250,504|2011-09-30|
PCT/US2012/021191|WO2012097217A1|2011-01-14|2012-01-13|Content based file chunking|